Forming an Integrated Lexical Resource for Word Sense Disambiguation

Author

  • Oi Yee Kwong
Abstract

This paper reports a full-scale linkage of noun senses between two existing lexical resources, WordNet and Roget's Thesaurus, to form an Integrated Lexical Resource (ILR) for use in natural language processing (NLP). The linkage was founded on a structurally based sense-mapping algorithm. About 18,000 nouns with over 30,000 senses were mapped. Although exhaustive verification is impractical, we show that an accuracy of some 70-80% can reasonably be expected for the resulting mappings. More importantly, the ILR, which contains enriched lexical information, is readily usable in many NLP tasks. We also explore some practical uses of the ILR in word sense disambiguation (WSD), since WSD notably requires a wide range of lexical information.
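The abstract does not spell out the structurally based mapping algorithm. Purely as an illustration of what "mapping senses between two inventories" involves, the sketch below links each sense in one resource to the sense in the other whose set of related words overlaps it most, scored with the Dice coefficient. This overlap heuristic, the function names `overlap_score` and `map_senses`, the `threshold` parameter, and the toy sense data are all assumptions for illustration; they are not the paper's algorithm or data.

```python
from typing import Dict, Optional, Set


def overlap_score(a: Set[str], b: Set[str]) -> float:
    """Dice coefficient of two word sets; 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return 2 * len(a & b) / (len(a) + len(b))


def map_senses(
    source: Dict[str, Set[str]],
    target: Dict[str, Set[str]],
    threshold: float = 0.0,
) -> Dict[str, Optional[str]]:
    """Map each source sense to the target sense with the highest overlap.

    Hypothetical helper (not from the paper): a source sense stays unmapped
    (None) unless some target sense scores strictly above `threshold`.
    """
    mapping: Dict[str, Optional[str]] = {}
    for s_id, s_words in source.items():
        best_id: Optional[str] = None
        best_score = threshold
        for t_id, t_words in target.items():
            score = overlap_score(s_words, t_words)
            if score > best_score:
                best_id, best_score = t_id, score
        mapping[s_id] = best_id
    return mapping


# Hypothetical toy data: WordNet-style senses of "bank" described by synonyms
# and hypernyms, and Roget-style groups described by the words they list.
wordnet_senses = {
    "bank#1": {"depository", "financial", "institution", "banking", "company"},
    "bank#2": {"slope", "land", "river", "sloping", "side"},
}
roget_groups = {
    "Money": {"treasury", "depository", "financial", "funds"},
    "Land": {"shore", "coast", "side", "river", "land"},
}

print(map_senses(wordnet_senses, roget_groups))
# prints {'bank#1': 'Money', 'bank#2': 'Land'}
```

Raising `threshold` leaves weakly supported senses unmapped, trading coverage for precision; how the actual ILR linkage balances the two is not stated in this abstract.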

Similar resources

Using Relevant Domains Resource for Word Sense Disambiguation

This paper presents a new method for Word Sense Disambiguation based on the WordNet Domains lexical resource [4]. The underlying working hypothesis is that domain labels, such as ARCHITECTURE, SPORT and MEDICINE provide a natural way to establish semantic relations between word senses, that can be used during the disambiguation process. This resource has already been used on Word Sense Disambi...

Generating Training Data for Semantic Role Labeling based on Label Transfer from Linked Lexical Resources

We present a new approach for generating role-labeled training data using Linked Lexical Resources, i.e., integrated lexical resources that combine several resources (e.g., WordNet, FrameNet, Wiktionary) by linking them on the sense or on the role level. Unlike resource-based supervision in relation extraction, we focus on complex linguistic annotations, more specifically FrameNet senses and ro...

Annotating WordNet

High-quality lexical resources are needed to both train and evaluate Word Sense Disambiguation (WSD) systems. The problem of ambiguity persists even in limited domains, thus the necessity for wide-coverage inventories of senses (dictionaries) and corpora sense-tagged to them. WordNet has been used extensively for WSD, for both its broad coverage and its large network of semantic relations. In t...

CASSAurus: A Resource of Simpler Spanish Synonyms

In this work we introduce and describe a language resource composed of lists of simpler synonyms for Spanish. The synonyms are divided into different senses taken from the Spanish OpenThesaurus, where context disambiguation was performed by using statistical information from the Web and Google Books Ngrams. This resource is freely available online and can be used for different NLP tasks such as l...

Uncertainty in data integration systems: automatic generation of probabilistic relationships

We propose a method for the automatic discovery of probabilistic relationships in the environment of data integration systems. Dynamic data integration systems extend the architecture of current data integration systems by modeling uncertainty at their core. Our method is a probabilistic word sense disambiguation (PWSD), which allows one to automatically lexically annotate (i.e. annotation w.r.t. a...

Publication date: 2001